Abstract: In this paper we tackle the issue of clustering
trajectories of geolocalized observations. Since clustering techniques
rely on the choice of a distance between the observations, we
first provide a comprehensive review of the different distances
used in the literature to compare trajectories. Then, based on
the limitations of these methods, we introduce a new distance:
the Symmetrized Segment-Path Distance (SSPD). We finally compare
this new distance to the others according to their corresponding
clustering results obtained using both hierarchical clustering and
affinity propagation methods.

Index Terms: Trajectory clustering

Introduction 

A TRAJECTORY is a set of positional information for a
moving object, ordered by time. This kind of multidimensional
data is prevalent in many fields and applications,
for example to understand migration patterns by studying
trajectories of animals, to predict meteorology with hurricane data,
or to improve athletes' performance. Our study concentrates
on vehicle trajectories within a road network. The growing
use of GPS receivers and WiFi-enabled mobile devices
equipped with hardware for storing data enables us to collect
a very large amount of data, which has to be analyzed in order
to extract any relevant information. The complexity of the
extracted data makes this a difficult challenge. In this context,
the goal of this work is to construct, in a data-driven way,
a collection of trajectories that will model the behaviors of
car drivers. These models will be learned from a data set of
time-stamped locations of cars. We focus in this work on the
clustering of trajectories of vehicles. The natural application
of this work is the forecast of the destination of drivers
according to the shape of their trajectories. To achieve this
goal, the first step is to cluster trajectories having similar paths.
This clustering is based on comparisons between trajectory
objects, which requires a definition of distance between
these objects.

A large amount of work has been done to give new
definitions of trajectory distance. Tiakas et al. (2009), Rossi
et al. (2012), Han et al. (2012) and Hwang et al. (2005)
propose road network based distances. They assume that the
trajectories studied are perfectly mapped on the road network.
However, this mapping is strongly dependent on the precision of
the GPS device. When the time interval between two GPS
locations is significant, several paths on the graph are possible
between locations, especially when the network is dense.
Moreover, it requires knowledge of the road network.
Here, we focus on completely data-driven methods without
any a priori information. Several methods have been used
to cluster data sets of trajectories. Clustering methods using
the Euclidean distance lead to bad results, mainly because
trajectories have different lengths. Hence, several
methods based on warping distances have been defined: Berndt
et al. (1994), Vlachos et al. (2002), Chen et al. (2004) and
Chen et al. (2005). These methods reorganize the time
index of trajectories to obtain a perfect match between them.
Another point of view is to focus on the geometry of the
trajectories, in particular on their shape. Shape distances like
the Hausdorff and Frechet distances can be adapted to trajectories
but fail to compare them as a whole. Lin et al. (2005)
proposed a method based exclusively on the shape of the
trajectory, but at a high computational cost.

In section II of this paper several distances are studied and
compared. A new distance is presented in section III: the
Symmetrized Segment-Path Distance (SSPD). SSPD is a shape-based
distance that does not take into account the time index
of the trajectory. It compares trajectories as a whole, and is
less affected by incidental variation between trajectories. It
also takes into account the total length, the variation and the
physical distance between two trajectories. To evaluate our
distance, and compare it to the others, clustering results of
some trajectory sets are analyzed in section V.

I. Model for trajectory clustering 
A. Trajectory 

A continuous trajectory is a function which gives the
location of a moving object as a continuous function of time.
In our case we will only consider discrete trajectories, defined
hereafter.

Definition 1. A trajectory $T$ is defined as

$$T : ((p_1, t_1), \ldots, (p_n, t_n)),$$

where $p_k \in \mathbb{R}^2$, $t_k \in \mathbb{R}$ $\forall k \in [1 \ldots n]$, $\forall n \in \mathbb{N}$, and $n$ is the
length of the trajectory $T$.

The exact locations between times $t_i$ and $t_{i+1}$ are unknown.
When these locations are required, a piece wise linear
representation is used between each pair of successive locations $p_i$ and
$p_{i+1}$, resulting in a line segment $s_i$ between these two points. This
new representation is called a piece wise linear trajectory. In
this representation, no assumption is made about the time indexing
of segment $s_i$.





TABLE I: Notation

Symbol                 Meaning
$\mathcal{T}$          The set of trajectories
$T^i$                  The $i$-th trajectory of set $\mathcal{T}$
$T^i_{pl}$             The piece wise linear representation of $T^i$
$n^i$                  Length of trajectory $T^i$
$n^i_{pl}$             Length of $T^i_{pl}$
$p^i_k$                The $k$-th location of $T^i$
E                      The set of continuous points that compose $T^i_{pl}$
$s^i_k$                The line segment between $p^i_k$ and $p^i_{k+1}$
$t^i_k$                The time index of location $p^i_k$
$\|p_k p_l\|_2$        The Euclidean distance between $p_k$ and $p_l$


Definition 2. A piece wise linear trajectory is defined as
$T_{pl} : ((s_1), \ldots, (s_{n-1}))$, where $s_k$ is the line segment
between $p_k$ and $p_{k+1}$, and $n_{pl}$ is the length of
the trajectory.

The length of the trajectory $n_{pl}$ is the sum of the lengths of
all segments that compose it: $n_{pl} = \sum_{i=1}^{n-1} \|p_i p_{i+1}\|_2$.

The notations used in this paper are summarized in Table I.
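As an illustration of Definitions 1 and 2, the following minimal sketch stores a trajectory as a plain list of planar points (time stamps are left out, since the piece wise linear representation ignores them) and computes the length $n_{pl}$ as the sum of segment lengths. The function names `segments` and `pl_length` and the sample coordinates are hypothetical, chosen only for this example.

```python
import math

def segments(traj):
    """Return the segments (p_k, p_{k+1}) of a trajectory given as a list of (x, y) points."""
    return list(zip(traj[:-1], traj[1:]))

def pl_length(traj):
    """Length n_pl of the piece wise linear trajectory: sum of Euclidean segment lengths."""
    return sum(math.dist(p, q) for p, q in segments(traj))

# Hypothetical trajectory with n = 3 locations, hence n - 1 = 2 segments.
traj = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
print(len(traj), len(segments(traj)), pl_length(traj))
```

Note that $n$ counts locations while $n_{pl}$ is a physical length, so the two are not comparable quantities.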

B. Distance 

There are many ways to define how close or how far two objects
are from one another. Beyond the notion of mathematical distance,
many functions can be used to qualify this dissimilarity. The
terminology used in the literature to define them is not completely
standardized. Therefore we will use the definitions established
in Deza et al. (2009) as a reference.

Definition 3. Let $\mathcal{T}$ be a set of trajectories. A function
$d : \mathcal{T} \times \mathcal{T} \rightarrow \mathbb{R}$ is called a dissimilarity on $\mathcal{T}$ if for all
$T^1, T^2 \in \mathcal{T}$:

• $d(T^1, T^2) \geq 0$
• $d(T^1, T^2) = d(T^2, T^1)$
• $d(T^1, T^1) = 0$

If all of these conditions are satisfied and $d(T^1, T^2) = 0 \Rightarrow T^1 = T^2$,
$d$ is considered to be a symmetric. If
the triangle inequality is also satisfied, $d$ is called a metric.
These notions are summarized in Table II.

X indicates the required properties for each distance, while
* indicates properties that are automatically satisfied (by the
presence of the other required properties for the metric).
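The hierarchy of Definition 3 can be checked numerically on a precomputed distance matrix. The sketch below is only an illustration under stated assumptions: the function name `classify` is hypothetical, the objects are assumed to be pairwise distinct (so that a zero off-diagonal entry violates the identity of indiscernibles), and a test on a finite sample can refute the triangle inequality but never prove it in general.

```python
from itertools import permutations

def classify(d, tol=1e-12):
    """Classify a square distance matrix d (list of lists) following Definition 3.

    Assumes the underlying objects are pairwise distinct.
    """
    idx = range(len(d))
    non_negative = all(d[i][j] >= -tol for i in idx for j in idx)
    symmetric = all(abs(d[i][j] - d[j][i]) <= tol for i in idx for j in idx)
    reflexive = all(abs(d[i][i]) <= tol for i in idx)
    if not (non_negative and symmetric and reflexive):
        return "not a dissimilarity"
    # Identity of indiscernibles: distinct objects must be at positive distance.
    identity = all(d[i][j] > tol for i in idx for j in idx if i != j)
    if not identity:
        return "dissimilarity"
    # Triangle inequality over every ordered triple of distinct objects.
    triangle = all(d[i][k] <= d[i][j] + d[j][k] + tol
                   for i, j, k in permutations(idx, 3))
    return "metric" if triangle else "symmetric"

points = [0.0, 1.0, 2.0]
absolute = [[abs(a - b) for b in points] for a in points]
squared = [[(a - b) ** 2 for b in points] for a in points]
print(classify(absolute), classify(squared))
```

The squared difference is the usual example of a function that is a symmetric in the sense above but not a metric: 4 > 1 + 1 breaks the triangle inequality.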

C. Desired properties of clustering and distances 

Our aim is to be able to predict the most probable next 
location of a moving object given few location data points. 
This prediction should be based on groups of past trajectories 
that have been gathered together sharing a similar behavior. 
Hence we aim at finding a clustering method that should 
regroup trajectories 

• with similar shape and length 

• which are physically close to each other 

• which are similar as a whole with more than just similar 
sub-parts 

• all of these properties should be considered without 
regard to their time indexing 

Moreover, we want to design a very general procedure able
to treat all trajectory data, without prior knowledge of
the particular geographical location where they are collected.
To obtain such a clustering, the issue addressed in this work is to find a
distance that respects these properties and succeeds in extracting
these features. The desired distance should have the
following properties:

• it compares trajectories as a whole,

• the compared trajectories can be of different lengths,

• the time indexing can be very different from one trajectory
to another,

• the trajectories can have similar shapes but be physically
far from each other, and vice versa,

• extra parameters should not be required.

II. Distance on trajectories: a review 

Three main kinds of distances have been introduced in
the literature. The first uses the underlying road network:
Network-Constrained Distances. These distances will not be
detailed in this paper. They assume that the road network
is known and that trajectory data are perfectly mapped on it.
Distances that do not use the underlying road network can
be classified into two categories: those which only compare the
shape of the trajectories, Shape-Based Distances, and those which
take into account the temporal dimension, Warping based
Distances.

The performance of clustering algorithms using these distances
will be compared in section V, as well as their computation cost
and their metric properties.

A. Warping based Distance 

Euclidean distance, Manhattan distance or other $L_p$-norm
distances are the most obvious and the most often used
distances. They compare discrete objects of the same length.
They can be used to look for common sub-trajectories of
a given length, but they cannot be used to compare entire
trajectories. Moreover, these distances compare locations
with common indexes one by one. At a given index $i$, location
$p^1_i$ of trajectory $T^1$ will be compared only to location $p^2_i$
of trajectory $T^2$. However, these locations can be strongly
different depending on the speeds along the trajectories. Hence,
it makes no sense to compare them without taking this into
account. This problem is common in time series analysis in general,
and not only in trajectory analysis.

Warping distances aim to solve this problem. For this purpose,
they make it possible to match locations from different trajectories
with different indexes. Then, they find an optimal alignment
between two trajectories, according to a given cost $\delta$ between
matched locations. Several warping based distances have been
defined: DTW (Berndt et al., 1994) and later LCSS
(Vlachos et al., 2002), EDR (Chen et al., 2005) and
ERP (Chen et al., 2004). These distances are defined the
same way, but they use different cost functions.

In order to define a warping distance, the two compared
trajectories $T^i$, $T^j$ are arranged to form an $n^i \times n^j$ grid
$G$. The grid cell $g_{k,l}$ corresponds to the pair $(p^i_k, p^j_l)$.

Definition 4. A warping path, $W = w_1, \ldots, w_{|W|}$, crosses the
grid $G$ such that

• $w_1 = g_{1,1}$,

• $w_{|W|} = g_{n^i,n^j}$,

• if $w_k = g_{k_i,k_j}$, then $w_{k+1}$ is equal to $g_{k_i+1,k_j}$, $g_{k_i,k_j+1}$
or $g_{k_i+1,k_j+1}$.

TABLE II: Metric Definition

Property                                                        Dissimilarity  Symmetric  Metric
Non-Negativity: $D(T^1, T^2) \geq 0$                            X              X          *
Symmetry: $D(T^1, T^2) = D(T^2, T^1)$                           X              X          X
Reflexivity: $D(T^1, T^1) = 0$                                  X
Triangle Inequality: $D(T^1, T^3) \leq D(T^1, T^2) + D(T^2, T^3)$                         X
Identity of indiscernibles: $D(T^1, T^2) = 0 \Rightarrow T^1 = T^2$            X          X

The order of the locations in a trajectory is maintained,
but they can be repeated, deleted or replaced by an arbitrary
value, a gap, along the warping path. The distance is then
computed by minimizing or maximizing the sum of a given
cost $\delta$ over all pairs of locations that make up a warping path
$W$, among all existing warping paths.


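Definition 5. A warping distance is defined as

$$D(T^i, T^j) = \min_W \text{ or } \max_W \left[ \sum_{k=1}^{|W|} \delta(w_k) \right] \qquad (1)$$

where $\delta(w_k) = \delta(g_{k_i,k_j}) = \delta(p^i_{k_i}, p^j_{k_j})$ is the cost function
and $W$ is a warping path.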


They are generally computed by dynamic programming.
Table III displays the cost functions as well as the dynamic
formulation of these distances.

Contrary to the three other distances, LCSS is a
similarity. The exact similarity used in Vlachos et al. (2002)
is $S(T^i, T^j) = \frac{LCSS(T^i, T^j)}{\min\{n^i, n^j\}}$, which is between 0 and 1. We
will then use the distance

$$D_{LCSS}(T^i, T^j) = 1 - S(T^i, T^j),$$

to compare distances to each other.

The metric types of these distance functions and the
computational costs of the four methods are summarized in
Table IV.
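The dynamic programming formulation can be sketched for DTW as follows. This is a minimal illustration, not the implementation evaluated in this paper: the function name `dtw` is hypothetical, trajectories are assumed to be lists of (x, y) points, and the table is filled over prefixes of the two trajectories, which yields the same value as the recursion on the remaining points.

```python
import math

def dtw(t1, t2):
    """DTW between two trajectories (lists of (x, y) points), Euclidean cost."""
    n, m = len(t1), len(t2)
    # c[i][j] holds the DTW value between the first i points of t1
    # and the first j points of t2; an empty prefix against a
    # non-empty one costs infinity, two empty prefixes cost 0.
    c = [[math.inf] * (m + 1) for _ in range(n + 1)]
    c[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(t1[i - 1], t2[j - 1])
            c[i][j] = cost + min(c[i - 1][j - 1],  # advance on both trajectories
                                 c[i - 1][j],      # advance on t1 only
                                 c[i][j - 1])      # advance on t2 only
    return c[n][m]

a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
b = [(0.0, 1.0), (2.0, 1.0)]
print(dtw(a, b))
```

The two nested loops make the quadratic cost explicit, and the trajectories do not need to have the same number of locations.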


1) Comparisons:

• All of these distances handle local time shifting.

• The cost function $\delta$ uses the Euclidean distance. Some
of these distances have been defined using an $L_1$-norm, but
the Euclidean distance is better suited to real values.

• LCSS and EDR's cost functions count the number of
occurrences where the Euclidean distance between matched
locations does or does not satisfy a spatial threshold, $\varepsilon_d$. The
former counts similar locations, the latter the differing ones.
This threshold makes the distance robust to noise. However,
it has a strong influence on the final results. If the
threshold is large, all the trajectories will be considered
similar, and if it is low, only those having very close locations
will be considered similar.

• In comparison, ERP and DTW put a weight on these
differences by computing the real distance between the
locations. In this sense they can be viewed as more
accurate.

• ERP is the only distance which is a metric regardless
of the $L_p$ norm used, yet it works better for normalized
sequences, especially for defining the gap value $g$. It is
not well suited to vehicle trajectories.

• In addition, these distances may include a time threshold,
$\varepsilon_t$. Thus, two locations will not be compared if the
difference between their time indexes is too large. However, it
is very hard to estimate the value of this threshold when
comparing trajectories, due to the presence of noise.
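The influence of the spatial threshold on LCSS and EDR can be seen on a small example. The sketch below is hedged: the function names and sample trajectories are hypothetical, trajectories are assumed non-empty, no time threshold is used, and the LCSS similarity is normalized by the length of the shorter trajectory, which is the normalization of Vlachos et al. (2002) and an assumption with respect to the text above.

```python
import math

def lcss_distance(t1, t2, eps):
    """1 - LCSS similarity; two locations match when closer than eps."""
    n, m = len(t1), len(t2)
    c = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if math.dist(t1[i - 1], t2[j - 1]) < eps:
                c[i][j] = c[i - 1][j - 1] + 1            # matched pair
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])  # skip one location
    return 1.0 - c[n][m] / min(n, m)

def edr(t1, t2, eps):
    """Edit distance on real sequences; a mismatch or a gap costs 1."""
    n, m = len(t1), len(t2)
    c = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        c[i][0] = i                                      # only gaps remain
    for j in range(m + 1):
        c[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if math.dist(t1[i - 1], t2[j - 1]) < eps else 1
            c[i][j] = min(c[i - 1][j - 1] + sub,
                          c[i - 1][j] + 1,
                          c[i][j - 1] + 1)
    return c[n][m]

a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
b = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]   # same shape, shifted by 0.5
for eps in (0.1, 1.0):
    print(eps, lcss_distance(a, b, eps), edr(a, b, eps))
```

With the small threshold the two parallel trajectories are as dissimilar as possible for both distances, while with the large one they are considered identical, which illustrates how strongly the choice of $\varepsilon_d$ drives the result.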

2) Pros and Cons: The main advantage of these distances is
that they enable the comparison of sequences of different lengths.

The two main limitations of warping based distances are the
following:

• Warping methods are based on one-to-one comparison
between sequences. Hence, they often require the choice
of a particular series that will be used as a reference,
onto which all other sequences will be matched. The
indexes of two sequences that are compared should be
well balanced in order to best capture the variability,
for instance to detect whether there were accelerations and
decelerations during the measurement of the time series.
Hence the choice of the reference sequence is very
important.

• The performance of usual methods based on warping
techniques is hampered by the large amount of noise
inherent to road traffic data, which is not the case when
studying time series.

Instead of correcting the time index, the solution is to use
distances from which the effect of time is removed.


B. Shape-Based Distance 

These distances try to capture geometric features of the
trajectories, in particular their shape. Among Shape-Based
Distances, the Hausdorff distance (Hausdorff, 1914)
and the Frechet distance (Frechet, 1906) are likely the
most well known.

1) Hausdorff: The Hausdorff distance is a metric. It measures
the distance between two sets of metric spaces. Informally,
for every point of set 1, the infimum distance from this
point to any point in set 2 is computed. The supremum
of all these distances defines the Hausdorff distance.
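TABLE III: Re-Indexing based distance definition

Cost function $\delta_{NAME}(p_1, p_2)$ and distance $NAME(T^i, T^j)$ for each method.

DTW

$$\delta_{DTW}(p_1, p_2) = \|p_1 p_2\|_2$$

$$DTW(T^i, T^j) = \begin{cases}
0 & \text{if } n^i = n^j = 0 \\
\infty & \text{if } n^i = 0 \text{ or } n^j = 0 \\
\delta_{DTW}(p^i_1, p^j_1) + \min \begin{Bmatrix} DTW(rest(T^i), rest(T^j)), \\ DTW(rest(T^i), T^j), \\ DTW(T^i, rest(T^j)) \end{Bmatrix} & \text{otherwise}
\end{cases}$$

LCSS

$$\delta_{LCSS}(p_1, p_2) = \begin{cases}
1 & \text{if } \|p_1 p_2\|_2 < \varepsilon_d \\
0 & \text{if } p_1 \text{ or } p_2 \text{ is a gap} \\
0 & \text{otherwise}
\end{cases}$$

$$LCSS(T^i, T^j) = \begin{cases}
0 & \text{if } n^i = 0 \text{ or } n^j = 0 \\
LCSS(rest(T^i), rest(T^j)) + \delta_{LCSS}(p^i_1, p^j_1) & \text{if } \delta_{LCSS}(p^i_1, p^j_1) = 1 \\
\max \begin{Bmatrix} LCSS(rest(T^i), T^j) + \delta_{LCSS}(p^i_1, gap), \\ LCSS(T^i, rest(T^j)) + \delta_{LCSS}(gap, p^j_1) \end{Bmatrix} & \text{otherwise}
\end{cases}$$

EDR

$$\delta_{EDR}(p_1, p_2) = \begin{cases}
0 & \text{if } \|p_1 p_2\|_2 < \varepsilon_d \\
1 & \text{if } p_1 \text{ or } p_2 \text{ is a gap} \\
1 & \text{otherwise}
\end{cases}$$

$$EDR(T^i, T^j) = \begin{cases}
n^i & \text{if } n^j = 0 \\
n^j & \text{if } n^i = 0 \\
EDR(rest(T^i), rest(T^j)) & \text{if } \delta_{EDR}(p^i_1, p^j_1) = 0 \\
\min \begin{Bmatrix} EDR(rest(T^i), rest(T^j)) + \delta_{EDR}(p^i_1, p^j_1), \\ EDR(rest(T^i), T^j) + \delta_{EDR}(p^i_1, gap), \\ EDR(T^i, rest(T^j)) + \delta_{EDR}(gap, p^j_1) \end{Bmatrix} & \text{otherwise}
\end{cases}$$

ERP

$$\delta_{ERP}(p_1, p_2) = \begin{cases}
\|p_1 p_2\|_2 & \text{if } p_1, p_2 \text{ are not gaps} \\
\|p_1 g\|_2 & \text{if } p_2 \text{ is a gap} \\
\|g p_2\|_2 & \text{if } p_1 \text{ is a gap}
\end{cases}$$

$$ERP(T^i, T^j) = \begin{cases}
\sum_{k=1}^{n^i} \|p^i_k g\|_2 & \text{if } n^j = 0 \\
\sum_{k=1}^{n^j} \|p^j_k g\|_2 & \text{if } n^i = 0 \\
\min \begin{Bmatrix} ERP(rest(T^i), rest(T^j)) + \delta_{ERP}(p^i_1, p^j_1), \\ ERP(rest(T^i), T^j) + \delta_{ERP}(p^i_1, gap), \\ ERP(T^i, rest(T^j)) + \delta_{ERP}(gap, p^j_1) \end{Bmatrix} & \text{otherwise}
\end{cases}$$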

TABLE IV: Re-Indexing based distance properties

Name    Metric Type    Computation Cost
DTW     symmetric      $O(n^2)$
LCSS    distance       $O(n^2)$
EDR     symmetric      $O(n^2)$
ERP     metric         $O(n^2)$

Definition 6. The Hausdorff distance between two sets of
metric spaces is defined as

$$Haus(X, Y) = \max\left\{ \sup_{x \in X} \inf_{y \in Y} \|xy\|_2, \; \sup_{y \in Y} \inf_{x \in X} \|xy\|_2 \right\}.$$

This distance is complicated and resource intensive to
calculate when applied to most existing sets. But in the case of
polygonal curves like trajectories, some simplification can be
made due to the monotonic properties of a segment. The distance
from a point $p$ to a segment $s$ is defined as follows.

Definition 7. Point-to-Segment distance.

$$D_{ps}(p^1_{i_1}, s^2_{i_2}) = \begin{cases}
\|p^1_{i_1} p^{1\,proj}_{i_1}\|_2 & \text{if } p^{1\,proj}_{i_1} \in s^2_{i_2}, \\
\min(\|p^1_{i_1} p^2_{i_2}\|_2, \|p^1_{i_1} p^2_{i_2+1}\|_2) & \text{otherwise,}
\end{cases}$$

where $p^{1\,proj}_{i_1}$ is the orthogonal projection of $p^1_{i_1}$ on the
segment $s^2_{i_2}$.

Hence, the Hausdorff distance between two line segments
is

$$D_{Hausdorff}(s^1_{i_1}, s^2_{i_2}) = \max\left\{ \sup_{p \in s^1_{i_1}} D_{ps}(p, s^2_{i_2}), \; \sup_{p \in s^2_{i_2}} D_{ps}(p, s^1_{i_1}) \right\}$$
$$= \max\{ D_{ps}(p^1_{i_1}, s^2_{i_2}), D_{ps}(p^1_{i_1+1}, s^2_{i_2}), D_{ps}(p^2_{i_2}, s^1_{i_1}), D_{ps}(p^2_{i_2+1}, s^1_{i_1}) \}.$$

Indeed, a segment is monotonic. As seen in Fig. 1, the
supremum of the Point-to-Segment distance from any
point of a segment to another segment occurs at one of the
end points of the first segment. The Hausdorff distance between
two trajectories can then be computed with the following
formula.

Fig. 1: Supremum of the Point-to-Segment distance from
points of segment $s^1_{i_1}$ to segment $s^2_{i_2}$

Definition 8. Hausdorff distance between two discrete trajectories.

$$D_{Hausdorff}(T^1, T^2) = \max\left\{ \max_{i_1 \in [1 \ldots n^1]} \min_{i_2 \in [1 \ldots n^2-1]} D_{ps}(p^1_{i_1}, s^2_{i_2}), \; \max_{i_2 \in [1 \ldots n^2]} \min_{i_1 \in [1 \ldots n^1-1]} D_{ps}(p^2_{i_2}, s^1_{i_1}) \right\}$$

The Hausdorff distance can then be computed in $O(n^2)$
computational time.
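The Point-to-Segment distance and the resulting Hausdorff distance between two trajectories can be sketched as below. This is an illustrative sketch with hypothetical function names, assuming planar trajectories given as lists of (x, y) points with at least two locations each; the distance from a location to a trajectory is taken as the minimum of its Point-to-Segment distances to the segments of that trajectory.

```python
import math

def point_to_segment(p, a, b):
    """Distance from point p to the segment [a, b]."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    norm2 = dx * dx + dy * dy
    if norm2 == 0.0:                      # degenerate segment reduced to a point
        return math.dist(p, a)
    # Position of the orthogonal projection of p along the segment (0 at a, 1 at b).
    u = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / norm2
    if 0.0 <= u <= 1.0:                   # the projection lies on the segment
        return math.dist(p, (a[0] + u * dx, a[1] + u * dy))
    return min(math.dist(p, a), math.dist(p, b))

def point_to_trajectory(p, traj):
    """Minimum Point-to-Segment distance from p to the segments of traj."""
    return min(point_to_segment(p, a, b) for a, b in zip(traj[:-1], traj[1:]))

def hausdorff(t1, t2):
    """Hausdorff distance between two piece wise linear trajectories."""
    return max(max(point_to_trajectory(p, t2) for p in t1),
               max(point_to_trajectory(p, t1) for p in t2))

t1 = [(0.0, 0.0), (4.0, 0.0)]
t2 = [(0.0, 3.0), (2.0, 3.0)]
print(hausdorff(t1, t2))
```

Only the observed locations need to be tested, since the supremum over a segment is reached at one of its end points.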
2) Frechet and discrete Frechet: The Frechet distance
measures similarity between curves. It is often known as the
"walking-dog distance". Imagine a dog and its owner walking
on two separate paths, without backtracking, from one endpoint
to the other. The Frechet distance is the minimum length
of leash required to connect the dog and its owner. While the
Hausdorff distance takes the distance between arbitrary points, the
Frechet metric takes the flow of the two curves into account.

Definition 9. The Frechet distance between two curves is
defined as

$$D_{Frechet}(A, B) = \inf_{\alpha, \beta} \max_{t \in [0,1]} \left\{ \|A(\alpha(t)) B(\beta(t))\|_2 \right\},$$

where $\alpha$ and $\beta$ range over the continuous non-decreasing
reparametrizations of $[0, 1]$.

Like the Hausdorff distance, the Frechet distance is
a metric. It is also resource intensive. Alt et al. (1995)
developed an algorithm computing the exact Frechet distance
for polygonal curves based on the free space definition.

Definition 10. A free space $F_\varepsilon(T^1, T^2)$ between two trajectories
is the set of all pairs of points whose distance is at most
$\varepsilon$.

$$F_\varepsilon(T^1, T^2) := \{ (p^1, p^2) \in T^1 \times T^2 \;|\; \|p^1 p^2\|_2 \leq \varepsilon \}.$$

The Frechet distance between two trajectories $T^1$ and $T^2$
is the minimum value of $\varepsilon$ for which a curve exists within the
corresponding $F_\varepsilon$ from $(p^1_1, p^2_1)$ to $(p^1_{n^1}, p^2_{n^2})$ with the property
of being monotone in both trajectories. Computing the Frechet
distance means finding this minimum value of $\varepsilon$. By exploiting
the monotonic property of the segments and the definition of
the free space, this task can be accomplished more efficiently.

Indeed, the Frechet distance between segments is equal to
the Hausdorff distance between segments, i.e.

$$D_{Frechet}(s^1_{i_1}, s^2_{i_2}) = \max\{ D_{ps}(p^1_{i_1}, s^2_{i_2}), D_{ps}(p^1_{i_1+1}, s^2_{i_2}), D_{ps}(p^2_{i_2}, s^1_{i_1}), D_{ps}(p^2_{i_2+1}, s^1_{i_1}) \} = \varepsilon_{i_1,i_2}.$$

To compute the Frechet distance between trajectories $T^1$
and $T^2$, we only look among the set $E$ of Frechet distances
between all pairs of segments of $T^1$ and $T^2$:
$E = \{ \varepsilon_{i_1,i_2} \text{ for } (i_1, i_2) \in ([1 \ldots n^1-1] \times [1 \ldots n^2-1]) \}$. This
simplification enables us to compute the Frechet distance
between trajectories $T^1$ and $T^2$ in $O(n^2 \log(n^2))$. We highlight
that this computational cost is higher than that of all the other distances
studied.

Eiter et al. (1994) describe an approximation of this
distance for polygonal curves called the discrete Frechet
distance. This distance is close to the definition of the warping
based distances.

Definition 11. The discrete Frechet distance is defined as

$$D_{Frechet-Discr}(T^1, T^2) = \min_W \left\{ \max_{k \in [1 \ldots |W|]} \|w_k\|_2 \right\},$$

with $W$ the warping path defined in Definition 4. The
discrete Frechet distance can be computed in $O(n^2)$ time.

This distance is bounded as follows.

Theorem 1. For any trajectories $T^1$ and $T^2$,

$$D_{Frechet}(T^1, T^2) \leq D_{Frechet-Discr}(T^1, T^2) \leq D_{Frechet}(T^1, T^2) + \varepsilon,$$

where $\varepsilon = \max\left\{ \max_k \{\|p^1_k p^1_{k+1}\|_2\}, \max_l \{\|p^2_l p^2_{l+1}\|_2\} \right\}$.
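The discrete Frechet distance lends itself to the same kind of dynamic programming as the warping distances, with the sum along the warping path replaced by a maximum. The sketch below is a minimal illustration under stated assumptions: the function name is hypothetical and trajectories are non-empty lists of (x, y) points.

```python
import math

def discrete_frechet(t1, t2):
    """Discrete Frechet distance between two trajectories (lists of (x, y) points)."""
    n, m = len(t1), len(t2)
    # c[i][j]: smallest achievable leash length for a warping path
    # that ends at the pair (t1[i], t2[j]).
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(t1[i], t2[j])
            if i == 0 and j == 0:
                c[i][j] = d
            elif i == 0:
                c[i][j] = max(c[0][j - 1], d)
            elif j == 0:
                c[i][j] = max(c[i - 1][0], d)
            else:
                c[i][j] = max(min(c[i - 1][j], c[i - 1][j - 1], c[i][j - 1]), d)
    return c[n - 1][m - 1]

t1 = [(0.0, 0.0), (4.0, 0.0)]
t2 = [(0.0, 1.0), (2.0, 1.0), (4.0, 1.0)]
print(discrete_frechet(t1, t2))
```

In this example the two trajectories are parallel lines one unit apart, yet the discrete value is larger than 1 because only observed locations can be matched: the middle location of the second trajectory has to be paired with an end point of the first. The gap stays within the bound given by the longest segment, here 4.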

3) One Way Distance: Lin et al. (2005) define the
One-Way-Distance, OWD, from a trajectory $T^1$ to another trajectory
$T^2$. It is defined as the integral of the distance from points of
$T^1$ to trajectory $T^2$, divided by the length of $T^1$.

Definition 12. The OWD distance is defined as

$$D_{OWD}(T^1, T^2) = \frac{1}{n^1_{pl}} \int_{p \in T^1} D_{point}(p, T^2) \, dp,$$

where $D_{point}(p, T)$ is the distance from the point $p$ to the
trajectory $T$, so that

$$D_{point}(p, T) = \min_{q \in T} \|pq\|_2.$$

The OWD distance is not symmetric, but
$D_{SOWD}(T^1, T^2) = (D_{OWD}(T^1, T^2) + D_{OWD}(T^2, T^1))/2$
is. This distance is a symmetric but not a metric, because it
does not satisfy the triangle inequality.

Lin et al. (2005) have defined two algorithms to compute the
OWD in the case of piecewise linear trajectories.

• The first consists in finding the parametrized OWD
function $D_{OWD}(s^1_k, T^2)$ from a segment $s^1_k$ of $T^1$ to
all segments of $T^2$, for all segments of $T^1$:

$$D_{OWD}(T^1, T^2) = \frac{1}{n^1_{pl}} \sum_{k=1}^{n^1-1} D_{OWD}(s^1_k, T^2) \cdot \|p^1_k p^1_{k+1}\|_2,$$

with an $O(n^2 \log(n))$ complexity.

• The second one uses a grid representation of the trajectory.
The space is discretized, as we see in Fig. 2. Trajectories
are defined as the succession of grid cells they cross.

Definition 13. A grid representation trajectory is defined
as

$$T_{grid} : (g_1, \ldots, g_{n_{grid}}),$$

where the $g_i$ are cells of the discrete space.

Fig. 2: Grid representation of a segment 

This representation simplifies the computation and reduces
the complexity to $O(nm)$, where $m$ is the number
of local min points. Local min points of a grid cell $g$ are
the grid cells whose distances to $g$ are shorter than those of their
neighboring grid cells.

Table V displays the metric types and the computational
cost of these distances.

TABLE V: Shape based distance properties

Name               Metric Type    Computation Cost
Hausdorff          metric         $O(n^2)$
Frechet            metric         $O(n^2 \log(n^2))$
discrete Frechet   symmetric      $O(n^2)$
OWD                symmetric      $O(n^2 \log(n))$
OWD grid           symmetric      $O(mn)$

4) Pros and Cons:

• Frechet and Hausdorff distances are both metrics,
meaning they satisfy the triangle inequality. With
clustering algorithms like dbscan or k-medoid, this is
a necessary property of the distance used if we want
the clustering algorithm to be efficient. They have been
widely used in many domains where shape comparison
is needed. But they can fail to compare trajectories as
a whole. Indeed, both Frechet and Hausdorff distances
return a maximum distance between two objects at
given points of the two objects. As we can see in
Fig. 3, despite the fact that the trajectories $T^1$ and
$T^2$ are well separated at the maximum value of $x$,
they are clearly more similar to each other than to
$T^3$. But with the Hausdorff distance, there are
no strong differences between $D_{Hausdorff}(T^1, T^2)$,
$D_{Hausdorff}(T^1, T^3)$ and $D_{Hausdorff}(T^2, T^3)$. With
Frechet, $D_{Frechet}(T^1, T^2)$ is even bigger than
$D_{Frechet}(T^1, T^3)$ and $D_{Frechet}(T^2, T^3)$.

Fig. 3: Frechet and Hausdorff computation between three
trajectories. Legend: Hauss($T^1$,$T^3$) = 3.02, Hauss($T^2$,$T^3$) = 3.50;
Frech($T^1$,$T^2$) = 6.00, Frech($T^2$,$T^3$) = 4.17.

• The Discrete Frechet distance requires considerably less
computing time than the Frechet distance. But
Discrete Frechet is not a metric. Moreover, due to its
similarity with the warping distances, it inherits the same
inconveniences.

• The distance presented in Lin et al. (2005) is by far
the one that best meets our requirements. It compares
trajectories as a whole, taking into account their shapes
and their physical distances, the required features for
our distance. However, its complexity makes it
computationally slow. The algorithm for the grid representation
is faster. Its computational time is $O(mn)$. Yet this does
not take into account the computation time required for
matching the trajectory to the grid. Moreover, the size
of the grid chosen strongly influences the final result
and makes it imprecise. Finally, the distance gives the
same "weight" to all points defining the trajectory: points
directly issued from the GPS locations, and points which
compose the piece wise linear representation. The greater
the length of a segment $s$ is, the stronger its influence
on the trajectory is. The more separated the endpoints of a
segment $s$ are, the less confident the interpolation between
them is.

In the following section, a new distance will be established,
inspired by both the OWD and the Hausdorff distances.


III. A new distance: Symmetrized Segment-Path Distance (SSPD)

The shape based distances are by far the distances that
best fit the desired properties defined in section I-C. However,
none of them matches them perfectly. Hence, a new distance
that fits these requirements is provided in this section: the
Symmetrized Segment-Path Distance. This distance is a shape
based distance. It takes into account the whole trajectories, and
is less affected by noise.

The distance $D_{pt}$ from a point $p$ to a trajectory $T$ is the
minimum of the distances between this point and all segments $s$
that compose $T$. The Segment-Path distance from trajectory
$T^1$ to trajectory $T^2$ is the mean of all distances from the points
composing $T^1$ to the trajectory $T^2$.

Definition 14. The SPD distance is defined as

$$D_{SPD}(T^1, T^2) = \frac{1}{n^1} \sum_{i_1=1}^{n^1} D_{pt}(p^1_{i_1}, T^2),$$

where

$$D_{pt}(p^1_{i_1}, T^2) = \min_{i_2 \in [1 \ldots n^2-1]} D_{ps}(p^1_{i_1}, s^2_{i_2}).$$

Proposition 1. If $T^1$ is a sub-trajectory of $T^2$,
$D_{SPD}(T^1, T^2) = 0$.

Proof. If $T^1$ is a sub-trajectory of $T^2$, all points of $T^1$
lie within segments that compose $T^2$. By definition,
$D_{ps}(p^1_{i_1}, s^2_{i_2}) = 0$ $\forall p^1_{i_1} \in s^2_{i_2}$, $s^2_{i_2} \in T^2$. It follows that
$D_{pt}(p^1_{i_1}, T^2) = 0$ $\forall p^1_{i_1} \in T^1$, and finally $D_{SPD}(T^1, T^2) = 0$. □

This distance is not symmetric. If $T^1$ is a very small
sub-trajectory of $T^2$, $D_{SPD}(T^1, T^2) = 0$ while $D_{SPD}(T^2, T^1)$ can
be very large. By taking the mean of these two distances, the
"Symmetrized Segment-Path Distance", SSPD, is defined
and is symmetric.

Definition 15. Symmetrized Segment-Path Distance

$$D_{SSPD}(T^1, T^2) = \frac{D_{SPD}(T^1, T^2) + D_{SPD}(T^2, T^1)}{2}$$

In Definitions 14 and 15, the SPD and SSPD distances are
computed by taking the mean of the Point-to-Trajectory
distances and of the two SPD distances, respectively. If the maximum
is used instead of the mean, one recovers the Hausdorff distance between
two trajectories. Computing only one distance between two
locations makes the Hausdorff distance very sensitive to noise, whereas our method
computes the mean of such quantities, which makes it less
sensitive to this noise. For example, for the trajectories in Fig. 3,
the SSPD distance between $T^1$ and $T^2$ is lower than the
distance between $T^1$ and $T^3$ or between $T^2$ and $T^3$
($D(T^1, T^2) = 0.58$, $D(T^1, T^3) = 1.5$, $D(T^2, T^3) = 2.03$).
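The SPD and SSPD definitions translate almost line by line into code. The following sketch is an illustration only, not the traj-dist implementation: function names and sample trajectories are hypothetical, and trajectories are assumed to be lists of planar (x, y) points with at least two locations.

```python
import math

def point_to_segment(p, a, b):
    """Distance from point p to the segment [a, b]."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    norm2 = dx * dx + dy * dy
    if norm2 == 0.0:
        return math.dist(p, a)
    u = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / norm2
    if 0.0 <= u <= 1.0:                       # projection falls on the segment
        return math.dist(p, (a[0] + u * dx, a[1] + u * dy))
    return min(math.dist(p, a), math.dist(p, b))

def point_to_trajectory(p, traj):
    """D_pt: minimum distance from p to the segments composing traj."""
    return min(point_to_segment(p, a, b) for a, b in zip(traj[:-1], traj[1:]))

def spd(t1, t2):
    """Segment-Path distance: mean of D_pt over the locations of t1."""
    return sum(point_to_trajectory(p, t2) for p in t1) / len(t1)

def sspd(t1, t2):
    """Symmetrized Segment-Path Distance: mean of the two SPD values."""
    return (spd(t1, t2) + spd(t2, t1)) / 2.0

long_traj = [(0.0, 0.0), (10.0, 0.0)]
sub_traj = [(2.0, 0.0), (3.0, 0.0)]       # lies entirely on long_traj
print(spd(sub_traj, long_traj), spd(long_traj, sub_traj), sspd(sub_traj, long_traj))
```

The example reproduces the asymmetry discussed above: the SPD from the sub-trajectory to the long trajectory is zero, while the SPD in the other direction is not, and the SSPD averages the two.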

Proposition 2. SSPD is a symmetric.

Proof. SSPD is a sum of Euclidean distances. By definition,
SSPD is greater than or equal to 0. By definition, SSPD is
symmetric. Finally, following Proposition 1, if $D_{SPD}(T^1, T^2) = 0$,
$T^1$ is a sub-trajectory of $T^2$. Therefore, if $D_{SSPD}(T^1, T^2) = 0$,
both $D_{SPD}(T^1, T^2) = 0$ and $D_{SPD}(T^2, T^1) = 0$, and
$T^1 = T^2$. SSPD is then a symmetric. □

SSPD is quite similar to OWD, but its definition resolves
most of the problems of OWD regarding the desired properties
defined in section I-C.

• The points coming from the interpolation of two observed
locations of a trajectory are less trustworthy than the
real observations. Hence, it is natural to strengthen the
importance of the observed points.

• The SSPD distance does not require any additional parameters
such as a threshold or a grid to be computed.

• Its computation cost is $O(n^2)$. It only depends on the
number of locations.


IV. Clustering 

To evaluate these different distances, we will study the different
clusterings obtained with the same algorithm but with distance
matrices computed using each of the previous distances. The selected
clustering methods and the criteria used to assess cluster quality are
presented in this section.


A. Methods 

The choice of the clustering method is restricted by the
characteristics of the trajectory object. Indeed, trajectories have
different lengths, which prevents an easy definition of a mean
trajectory object. Neither the k-means method nor spectral
clustering methods can be used on our trajectory set. k-medoid can
be used, but efficient algorithms such as partitioning around
medoids, as well as the dbscan method, require a valid metric. Indeed,
these algorithms are based on nearest neighbors and require
the distance used to satisfy the triangle inequality. Most of
the studied distances, such as SSPD, LCSS and DTW, are not metrics.
For this reason, the dbscan and partitioning around medoids algorithms
will not be used. Moreover, dbscan depends on two extra
parameters that are hard to estimate in this case.

To perform the clustering of the trajectories, we will focus
on two methodologies: hierarchical cluster analysis (HCA)
and affinity propagation (AP). HCA and AP can use a
distance/similarity which does not satisfy
the triangle inequality. In addition, both methods only require the
distance/similarity matrix, and thus can cluster objects of
different lengths. Both
these methods will be used to evaluate our distance.
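Since both methods only need the pairwise matrix, a hierarchical clustering can be run directly on a precomputed distance matrix. The sketch below uses scipy's `squareform`, `linkage` and `fcluster`; the helper name `hca_labels` and the toy matrix are hypothetical, and the matrix is assumed to be symmetric with a zero diagonal.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def hca_labels(dist_matrix, n_clusters, method="average"):
    """Cluster objects from a square distance matrix with hierarchical clustering."""
    # linkage expects the condensed (upper-triangular) form of the matrix.
    condensed = squareform(np.asarray(dist_matrix, dtype=float), checks=True)
    tree = linkage(condensed, method=method)
    # Cut the dendrogram so that at most n_clusters groups remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Toy matrix for four objects forming two tight pairs.
D = [[0.0, 1.0, 9.0, 10.0],
     [1.0, 0.0, 8.0, 9.0],
     [9.0, 8.0, 0.0, 1.0],
     [10.0, 9.0, 1.0, 0.0]]
labels = hca_labels(D, n_clusters=2)
print(labels)
```

Affinity propagation can be driven the same way: sklearn's `AffinityPropagation` accepts `affinity="precomputed"`, in which case it expects a similarity matrix, for instance the negated distance matrix. One caveat worth keeping in mind is that scipy documents the ward linkage as assuming Euclidean distances, so its use with other distances is a heuristic.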



B. Quality criterion of cluster result 

A clustering algorithm aims at gathering objects into
homogeneous groups that are far from one another. Hence, the
optimal number of clusters is usually selected by looking at
the between and within variances of the obtained clusters. In
this particular case, they cannot be computed
because of the impossibility of computing the mean of the trajectory
objects. Yet, we approximate this mean by considering an
exemplar of a set of trajectories $\mathcal{T}$ of size $n^{\mathcal{T}}$, defined
as

$$T^{\mathcal{T}*} = \arg\min_{i \in [1 \ldots n^{\mathcal{T}}]} \sum_{j=1}^{n^{\mathcal{T}}} D(T^i, T^j).$$

Let $C_1, \ldots, C_K$ be a set of clusters of $\mathcal{T}$. The between
and within variances are then replaced by the Between-Like and the
Within-Like criteria.

Definition 16. Between-Like and Within-Like

$$BC = \sum_{k=1}^{K} D(T^{C_k*}, T^{\mathcal{T}*})$$

$$WC = \sum_{k=1}^{K} \frac{1}{|C_k|} \sum_{T^i \in C_k} D(T^i, T^{C_k*})$$

The Within-Like criterion shows the spread of elements 
belonging to the same cluster while the Between-Like criterion 
shows the spread between clusters. As for the variance, for a 
given number of clusters, we want the Within-Like criterion 
to be as small as possible, and the Between-Like criterion to 
be as big as possible. 
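The two criteria can be computed from the distance matrix and the cluster labels alone. The sketch below is an assumption-laden illustration: the function names are hypothetical, the exemplar of a set is taken as the member minimizing the sum of distances to the other members, the Within-Like criterion averages the distances to the cluster exemplar inside each cluster, and the Between-Like criterion is assumed here to sum the distances from each cluster exemplar to the exemplar of the whole set.

```python
def exemplar(dist, members):
    """Index of the member minimizing the sum of distances to all members."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))

def within_between(dist, labels):
    """Return (Within-Like, Between-Like) for a distance matrix and cluster labels."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    global_exemplar = exemplar(dist, list(range(len(labels))))
    within, between = 0.0, 0.0
    for members in clusters.values():
        ex = exemplar(dist, members)
        # Mean distance of the cluster members to their exemplar.
        within += sum(dist[i][ex] for i in members) / len(members)
        # Assumed between term: cluster exemplar to global exemplar.
        between += dist[ex][global_exemplar]
    return within, between

# Toy matrix for four objects forming two tight pairs.
D = [[0.0, 1.0, 9.0, 10.0],
     [1.0, 0.0, 8.0, 9.0],
     [9.0, 8.0, 0.0, 1.0],
     [10.0, 9.0, 1.0, 0.0]]
print(within_between(D, [1, 1, 2, 2]))
print(within_between(D, [1, 2, 1, 2]))
```

On this toy example the natural grouping yields a much smaller Within-Like value than the grouping that mixes the two pairs, which is the behavior the criterion is meant to capture.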
V. Experimental evaluation

In this section, we evaluate and compare six distances: LCSS,
DTW, Hausdorff, Frechet, Discrete Frechet, and
SSPD. All these distances have been implemented in both
python and cython and are available in the traj-dist package
(https://github.com/bguillouet/traj-dist).

We also use python for the implementation of the chosen
clustering algorithms: the sklearn library for affinity propagation
and the scipy library for hierarchical clustering analysis. For
the latter, the weighted, average, ward and single linkage criteria
have been compared.

A. The Data 

The data we used are GPS data from 536 San Francisco
taxis over a 24-day period. These data are publicly available.
We extracted a subset of this data set, made of 2802 trajectories.
They all have the
same pickup location, the Caltrain station, and all have their
drop-off location in downtown San Francisco.


B. Computation cost 


Table VI reports the computation time needed
to compute the distance matrix for 100 trajectories composed
of between 3 and 36 locations, most having around 10.


TABLE VI: Computation Time in seconds

Distance           Python    Cython
Frechet            131.76    36.32
Discrete Frechet   3.67      2.24
Hausdorff          13.36     0.28
DTW                3.63      0.40
LCSS               2.79      0.60
SSPD               13.20     0.32


The Frechet distance takes the most computation time. It is
the only method that runs in $O(n^2 \log(n^2))$. With
python, DTW, LCSS and Discrete Frechet are the fastest
methods, while Hausdorff and SSPD are the fastest with
cython, thanks to its ability to declare static variables and
to use the C math library. DTW, LCSS and Discrete Frechet
each have a backtracking step which is not improved by
the cython implementation. This explains the faster computing
time for Hausdorff and SSPD.
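
The gain brought by cython can be quantified directly from
Table VI. The short sketch below computes the ratio of python
time to cython time for each distance; the dictionary simply
copies the table values, and the variable names are
illustrative.

```python
# Computation times in seconds (python, cython), copied from Table VI
timings = {
    "Frechet": (131.76, 36.32),
    "Discrete Frechet": (3.67, 2.24),
    "Hausdorff": (13.36, 0.28),
    "DTW": (3.63, 0.40),
    "LCSS": (2.79, 0.60),
    "SSPD": (13.20, 0.32),
}

# Speedup factor = python time / cython time
speedups = {name: py / cy for name, (py, cy) in timings.items()}

# Print the distances from the largest to the smallest speedup
for name, ratio in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<17} x{ratio:.1f}")
```

With these figures the ratio is above 40 for Hausdorff and
SSPD and below 10 for the four other distances.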


C. Analysis of the selection of the number of clusters

In Fig. 7 we can observe the evolution of the Within-Like and
Between-Like criteria described in Section IV for the SSPD
distance and for the selected methods AP and CAH. Both
criteria are displayed because, unlike the sum of the between
and within variances, their sum is not constant.

Fig. 7: Evolution of the Within-Like and Between-Like criteria
depending on cluster size.

Fig. 9: Evolution of the Within-Like and Between-Like criteria
depending on the cluster size for all distances using the AP
method


The CAH single method gives poor results. For all other
methods, the studied criteria evolve in the same way with the
number of clusters. A plateau can be observed starting from a
number of clusters between 15 and 20: adding more clusters
does not significantly decrease the Within-Like criterion.
Twenty is therefore a good number of clusters for the CAH
method.

CAH Ward and AP give the best results. However, the latter
does not find any clustering with fewer than 38 clusters,
which is too large a number of clusters.

The same conclusions can be drawn for the six studied
distances.

The CAH Ward method with 20 clusters and the AP method with
the preference parameter fixed to the minimum of the computed
distance matrix will be used to compare the studied distances
in more detail.
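
A possible way to set up this AP configuration with sklearn is
sketched below. sklearn's AffinityPropagation works on
similarities, so the sketch assumes that the similarity is the
negated distance and that the preference is the minimum of the
resulting similarity matrix; this reading of the preference
setting, the toy matrix and the variable names are assumptions
made for the example.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Toy symmetric distance matrix (hypothetical data)
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
D = np.abs(points[:, None] - points[None, :])

# Assumption: similarity is the negated distance
S = -D
# Assumption: preference is the minimum of the similarity matrix
preference = S.min()

ap = AffinityPropagation(
    affinity="precomputed", preference=preference, random_state=0
)
labels = ap.fit_predict(S)

# Each exemplar found by AP defines one cluster
n_clusters = len(ap.cluster_centers_indices_)
print(n_clusters, labels)
```

A low preference favours a small number of exemplars, which is
why the number of clusters returned by AP is the smallest one
the method reaches here rather than a value chosen in advance.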


D. Analysis of the distances 

We can observe the evolution of the Within-Like and the
Between-Like criteria for the two selected clustering methods
and for all studied distances. The CAH WARD results
are displayed in Fig. 8 and the AP results in Fig. 9.


[Figure: Between-Like Criterion (%) versus number of clusters]


The minimum number of clusters found by the AP method differs
significantly depending on the distance used: 21 clusters with
the DTW distance, 35 with SSPD and 54 with Hausdorff.

The warping-based distances, LCSS and DTW, give the
poorest results, with LCSS being significantly worse than DTW.
The two shape-based distances, Frechet and Hausdorff, give
better results, and the evolution of their criteria is very
similar. The Discrete Frechet distance lies between these
two types of distances. These results confirm that shape-based
distances are better adapted than warping-based distances to
our objectives.

Finally, the new distance SSPD gives the best results. It has
the lowest value of the Within-Like criterion for all numbers
of clusters and with both the CAH WARD and AP clustering
methods.

We can observe the visual results for this distance and both
clustering methods in Fig. 10, and the isolated clusters in
Fig. 11.



Fig. 10: Clustering results with SSPD distance


Fig. 8: Evolution of the Within-Like and Between-Like criteria
depending on cluster size for all distances using the
CAH-WARD method


We can observe that trajectories are well classified according
to their path. In the corresponding figure, the clusters found
with CAH WARD seem to be consistent. The number of clusters
with the AP method is 38. This is a large number according to
the Within-Like criterion computed with CAH. In fact, the
Within-Like criterion does not decrease much between 20 and
35. However, we can see that the clusters found with AP are
still consistent.

Fig. 11: The isolated clusters


A clustering computed with the CAH WARD method on a
distance matrix computed with SSPD gives the best results. The
Between-Like and Within-Like criteria show that this method
groups trajectories well around their exemplars.


Conclusion 

Clustering of non-Euclidean objects relies heavily on the
choice of a proper distance. For trajectory analysis, we
presented different distances focusing on different features of
such objects. To cope with their respective weaknesses, we
propose a new distance, the Symmetrized Segment-Path Distance.
This distance is time insensitive, and compares the shape and
the physical distance between two trajectory objects. It yields
a good clustering using either hierarchical clustering
or affinity propagation methods. The clusters obtained
are homogeneous with regard to shape and seem to properly
capture the behaviours of the drivers. We have thus obtained
a partition of the network based on the uses of the drivers
that can still be interpreted as vehicle trajectories. Using
such features to forecast the final destination of the drivers
will be tackled in future work.